Note: I worked with Cayce Hook and MH Tessler on this problem set.

This is problem set #2, in which we hope you will practice the visualization package ggplot2, as well as hone your knowledge of the packages tidyr and dplyr.

Part 1: Basic intro to ggplot

Part 1A: Exploring ggplot2 using qplot

Note, that this example is from the_grammar.R on http://had.co.nz/ggplot2 I’ve adapted this for psych 254 purposes

First install and load the package.

#install.packages("ggplot2")
library(ggplot2)
setwd("~/Documents/STANFORD PHD/Psych 254 Winter 2014/ProblemSets/analyses")

Now we’re going to use qplot. qplot is the easy interface, meant to replace plot. You can give it simple qplot(x,y) examples, or slightly more complex examples like qplot(x, y, col=grp, data=d).

We’re going to be using the diamonds dataset. This is a set of measurements of diamonds, along with their price etc.

head(diamonds)
##   carat       cut color clarity depth table price    x    y    z
## 1  0.23     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
## 2  0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
## 3  0.23      Good     E     VS1  56.9    65   327 4.05 4.07 2.31
## 4  0.29   Premium     I     VS2  62.4    58   334 4.20 4.23 2.63
## 5  0.31      Good     J     SI2  63.3    58   335 4.34 4.35 2.75
## 6  0.24 Very Good     J    VVS2  62.8    57   336 3.94 3.96 2.48
qplot(diamonds$carat, diamonds$price)

plot of chunk unnamed-chunk-2

Scatter plots are trivial, and easy to add features to. Modify this plot so that it uses the dataframe rather than working from variables in the general namespace (good to get away from retyping diamonds$ every time you reference a variable).

qplot(carat, price, data = diamonds)

plot of chunk Add diamonds as the dataframe

Try adding clarity and cut, using shape and color as your visual variables.

qplot(carat, price, color = clarity, shape = cut, data = diamonds)

plot of chunk add clarity and cut as shape and color

One of the primary benefits of ggplot2 is the use of facets - also known as small multiples in the Tufte vocabulary. That last plot was probably hard to read. Facets could make it better. Try adding a facets = x ~ y argument. x ~ y means row facets are by x, column facets by y.

qplot(carat, price, color = clarity, shape = cut, facets = clarity ~ cut, data = diamonds)

plot of chunk unnamed-chunk-3

But facets can also get overwhelming. Try to strike a good balance between color, shape, and faceting.

HINT: facets = . ~ x puts x on the columns, but facets = ~ x (no dot) wraps the facets. These are underlying calls to different functions, facet_wrap (no dot) and facet_grid (two arguments).

qplot(carat, price, color = clarity, facets = . ~ cut, data = diamonds)

plot of chunk unnamed-chunk-4

The basic unit of a ggplot plot is a “geom” - a mapping between data (via an “aesthetic”) and a particular geometric configuration on coordinate axes.

Let’s try some other geoms and manipulate their parameters. First, try a histogram (geom="hist").

qplot(carat, fill = clarity, data = diamonds)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

plot of chunk unnamed-chunk-5

#I tried setting geom to "hist", but it returned the following error: "No geom called hist", so I called qplot with only the x-variable defined.

Now facet your histogram by clarity and cut.

qplot(carat, fill = clarity, facets = clarity ~ cut, data = diamonds)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

plot of chunk unnamed-chunk-6

I like a slightly cleaner look to my plots. Luckily, ggplot allows you to add “themes” to your plots. Try doing the same plot but adding + theme_bw() or + theme_classic(). Different themes work better for different applications, in my experience.

#Below I try `theme_bw()` on the histogram of carats
qplot(carat, fill = clarity, facets = clarity ~ cut, data = diamonds) + 
  theme_bw()
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

plot of chunk unnamed-chunk-7

#Next I try `theme_classic()` on the histogram of carats
qplot(carat, fill = clarity, facets = clarity ~ cut, data = diamonds) +
  theme_classic()
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

plot of chunk unnamed-chunk-7

Part 1B: Exploring ggplot2 using ggplot

ggplot is just a way of building qplot calls up more systematically. It’s sometimes easier to use and sometimes a bit more complicated. What I want to show off here is the functionality of being able to build up complex plots with multiple elements. You can actually do this using qplot pretty easily, but there are a few things that are hard to do.

ggplot is the basic call, where you specify A) a dataframe and B) an aesthetic mapping from variables in the plot space to variables in the dataset.

d <- ggplot(diamonds, aes(x=carat, y=price)) # first you set the aesthetic and dataset
d + geom_point() # then you add geoms

plot of chunk unnamed-chunk-8

d + geom_point(aes(colour = carat)) # and you can keep doing this to add layers to the plot

plot of chunk unnamed-chunk-8

Try writing this as a single set of additions (e.g. one line of R code, though you can put in linebreaks). This is the most common workflow for me.

ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(aes(colour = carat))

plot of chunk unnamed-chunk-9

You can also set the aesthetic separately for each geom, and make some great plots this way. Though this can get complicated. Try using ggplot to build a histogram of prices.

ggplot(diamonds, aes(x=price)) + 
  geom_histogram(aes(fill = clarity)) +
  facet_grid(clarity ~ cut) +
  theme_bw()
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

plot of chunk unnamed-chunk-10

Part 2: Diving into real data: Sklar et al. (2012)

Sklar et al. (2012) claims evidence for unconscious arithmetic processing. We’re going to do a reanalysis of their Experiment 6, which is the primary piece of evidence for that claim. The data are generously contributed by Asael Sklar.

First let’s set up a few preliminaries.

library(tidyr)
## Warning: package 'tidyr' was built under R version 3.1.2
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.1.2
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
sem <- function(x) {sd(x) / sqrt(length(x))}
ci95 <- function(x) {sem(x) * 1.96}

Data Prep

First read in two data files and subject info. A and B refer to different trial order counterbalances.

subinfo <- read.csv("../data/sklar_expt6_subinfo_corrected.csv")
d.a <- read.csv("../data/sklar_expt6a_corrected.csv")
d.b <- read.csv("../data/sklar_expt6b_corrected.csv")

Gather these datasets into long form and get rid of the Xs in the headers.

d.a.tidy = d.a %>%
  gather(subid, RT, starts_with("X"))

d.b.tidy = d.b %>%
  gather(subid, RT, starts_with("X"))

#I remove the Xs below

Bind these together. Check out bind_rows.

d.ab.tidy = bind_rows(d.a.tidy, d.b.tidy) 
## Warning: Unequal factor levels: coercing to character
#Below I remove the X's from the subid and re-code subid as a number
d.ab.tidy = d.ab.tidy %>% 
  mutate(subid = as.integer(gsub("X", "", subid)))

Merge these with subject info. You will need to look into merge and its relatives, left_join and right_join. Call this dataframe d, by convention.

#Merge data with subject info
d.full = right_join(d.ab.tidy, subinfo, by = "subid")

#I reorder the columns, so that all the subject info comes before the RT
d = select(d.full, subid, prime, prime.result, target, congruent, operand, distance, counterbalance, presentation.time, subjective.test, objective.test, RT)

Clean up the factor structure.

d$presentation.time <- factor(d$presentation.time)
levels(d$operand) <- c("addition","subtraction")
d$subid <- factor(d$subid)
d$counterbalance <- factor(d$counterbalance)
str(d)
## Classes 'tbl_df', 'tbl' and 'data.frame':    6468 obs. of  12 variables:
##  $ subid            : Factor w/ 42 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ prime            : Factor w/ 154 levels "=1+2+5","=1+2+9",..: 1 3 5 8 10 11 12 14 15 17 ...
##  $ prime.result     : int  8 9 8 10 12 13 12 11 12 13 ...
##  $ target           : int  9 11 12 12 11 12 11 10 11 9 ...
##  $ congruent        : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ operand          : Factor w/ 2 levels "addition","subtraction": 1 1 1 1 1 1 1 1 1 1 ...
##  $ distance         : int  -1 -2 -4 -2 1 1 1 1 1 4 ...
##  $ counterbalance   : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
##  $ presentation.time: Factor w/ 2 levels "1700","2000": 1 1 1 1 1 1 1 1 1 1 ...
##  $ subjective.test  : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ objective.test   : num  0.587 0.587 0.587 0.587 0.587 ...
##  $ RT               : int  597 699 700 628 768 595 664 803 767 700 ...

Data Analysis Preliminaries

Examine the basic properties of the dataset. First, take a histogram.

ggplot(d, aes(x = RT)) + 
  geom_histogram()
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

plot of chunk unnamed-chunk-17

Challenge question: what is the sample rate of the input device they are using to gather RTs?

#I did not have a chance to answer this challenge question.

Sklar et al. did two manipulation checks. Subjective - asking participants whether they saw the primes - and objective - asking them to report the parity of the primes (even or odd) to find out if they could actually read the primes when they tried. Examine both the unconscious and conscious manipulation checks (this information is stored in subinfo). What do you see? Are they related to one another?

ggplot(subinfo, aes(x = objective.test)) + 
  geom_histogram() + 
  facet_grid(~subjective.test)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

plot of chunk unnamed-chunk-19

qplot(subid, objective.test, color = subjective.test, facets = . ~ subjective.test, data = subinfo)

plot of chunk unnamed-chunk-19

#From these plots above, it appears that subjects who report seeing the prime, on average perform higher on the objective test.

r = lm(objective.test ~ subjective.test, data = d)
summary(r)
## 
## Call:
## lm(formula = objective.test ~ subjective.test, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.3162 -0.1130 -0.0032  0.0802  0.3791 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      0.54276    0.00261     208   <2e-16 ***
## subjective.test  0.21089    0.00370      57   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.149 on 6466 degrees of freedom
## Multiple R-squared:  0.335,  Adjusted R-squared:  0.335 
## F-statistic: 3.25e+03 on 1 and 6466 DF,  p-value: <2e-16
#A linear regression testing whether performance on the subjective test predicts performance on the objective test reveals a significant relationship between these two manipulation checks: If subjects pass the subjective test, they are more likely to perform higher on the objective test (b =0.21, p < 0.001). In other words, as concluded from the plots above, subjects who report seeing the prime on average perform higher on the objective test than subjects who claim they did not see the prime.

OK, let’s turn back to the measure and implement Sklar et al.’s exclusion criterion. You need to have said you couldn’t see (subjective test) and also be not significantly above chance on the objective test (< .6 correct). Call your new data frame ds.

ds = d %>%
  filter(subjective.test == 0, objective.test < 0.6)

#View(ds)

#Below I am just checking that the filter did what I intended.
summarise(ds, max = max(subjective.test))
## Source: local data frame [1 x 1]
## 
##   max
## 1   0
summarise(ds, max = max(objective.test))
## Source: local data frame [1 x 1]
## 
##      max
## 1 0.5873

Sklar et al.’s analysis

Sklar et al. show a plot of a “facilitation effect” - the time to respond to incongruent primes minus the time to respond to congruent primes. They then show plot this difference score for the subtraction condition and for the two presentation times they tested. Try to reproduce this analysis.

HINT: first take averages within subjects, then compute your error bars across participants, using the sem function (defined above).

ds_summary = ds %>% 
  group_by(subid, presentation.time, operand, congruent) %>%
  summarise(avg = mean(RT, na.rm = T)) %>%
  spread(congruent, avg) %>% 
  mutate(diff = no - yes) %>%
  group_by(operand, presentation.time) %>%
  summarise(avg = mean(diff), err = sem(diff))

ds_summary
## Source: local data frame [4 x 4]
## Groups: operand
## 
##       operand presentation.time     avg   err
## 1    addition              1700 -13.929 8.902
## 2    addition              2000   4.026 5.022
## 3 subtraction              1700  20.999 6.275
## 4 subtraction              2000   9.976 4.447

Now plot this summary, giving more or less the bar plot that Sklar et al. gave (though I would keep operation as a variable here. Make sure you get some error bars on there (e.g. geom_errorbar or geom_linerange).

library(ggplot2)
sklar_plot = ggplot(data = ds_summary, aes(x = presentation.time, y = avg, fill = presentation.time)) 

sklar_plot + 
  geom_bar(stat = "identity") + facet_wrap(~operand) +  
  geom_errorbar(width = .1, aes(ymin = avg - err, ymax = avg + err))
## Warning: Stacking not well defined when ymin != 0

plot of chunk unnamed-chunk-22

What do you see here? How close is it to what Sklar et al. report? Do the error bars match? How do you interpret these data?

Response: The average facilitation effects for the subtraction primes for each of the presentation times appear to match the averages reported and plotted in the Sklar et al. paper. However, my error bars do not match. It appears that Sklar et al. plotted half of the standard error above and below the mean rather than plotting one full standard error above and below the mean.

Challenge problem: verify Sklar et al.’s claim about the relationship between RT and the objective manipulation check.

qplot(objective.test, RT, data = ds)
## Warning: Removed 87 rows containing missing values (geom_point).

plot of chunk unnamed-chunk-23

cor.test(ds$objective.test, ds$RT, use = pairwise.complete.obs)
## 
##  Pearson's product-moment correlation
## 
## data:  ds$objective.test and ds$RT
## t = 2.035, df = 2529, p-value = 0.04193
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.001478 0.079274
## sample estimates:
##     cor 
## 0.04044

From the plot, there does not appear to be a relationship between RT and the objective manipulation. The correlation coefficient between these variables is also low (r = 0.04), suggesting there is no relationship between participants’ performance on the objective manipulation and their reaction times.

Your own analysis

Show us what you would do with these data, operating from first principles. What’s the fairest plot showing a test of Sklar et al.’s original hypothesis that people can do arithmetic “non-consciously”?

#First, I would look at the data without excluding any participants. I would also use confidence intervals rather than standard error bars.
d_summary = d %>% 
  group_by(subid, presentation.time, operand, congruent) %>%
  summarise(avg = mean(RT, na.rm = T)) %>%
  spread(congruent, avg) %>% 
  mutate(diff = no - yes) %>%
  group_by(operand, presentation.time) %>%
  summarise(avg = mean(diff), err = sem(diff), c.95 = ci95(diff))

ggplot(data = d_summary, aes(x = presentation.time, y = avg, fill = presentation.time)) + 
  geom_bar(stat="identity") + 
  facet_wrap(~operand) +  
  geom_errorbar(width=.1, aes(ymin=avg-c.95, ymax=avg+c.95)) +
  theme_bw()
## Warning: Stacking not well defined when ymin != 0

plot of chunk unnamed-chunk-24

#I am somewhat concerned with the authors' exclusion criteria because such a high number of subjects are excluded from the analysis. Below, I first only exclude subjects who passed the subjective test. Second, I only exclude subjects who scored higher than .6 on the objective test. I compare the plots of the data using these exclusion criteria.

ds_st = d %>%
  filter(subjective.test == 0)

ds_st_summary = ds_st %>% 
  group_by(subid, presentation.time, operand, congruent) %>%
  summarise(avg = mean(RT, na.rm = T)) %>%
  spread(congruent, avg) %>% 
  mutate(diff = no - yes) %>%
  group_by(operand, presentation.time) %>%
  summarise(avg = mean(diff), err = sem(diff), c.95 = ci95(diff))

ggplot(data = ds_st_summary, aes(x = presentation.time, y = avg, fill = presentation.time)) + 
  geom_bar(stat="identity") + 
  facet_wrap(~operand) +  
  geom_errorbar(width=.1, aes(ymin=avg-c.95, ymax=avg+c.95)) +
  theme_bw()
## Warning: Stacking not well defined when ymin != 0

plot of chunk unnamed-chunk-24

ds_ot = d %>%
  filter(objective.test < 0.6)

ds_ot_summary = ds_ot %>% 
  group_by(subid, presentation.time, operand, congruent) %>%
  summarise(avg = mean(RT, na.rm = T)) %>%
  spread(congruent, avg) %>% 
  mutate(diff = no - yes) %>%
  group_by(operand, presentation.time) %>%
  summarise(avg = mean(diff), err = sem(diff), c.95 = ci95(diff))

ggplot(d = ds_ot_summary, aes(x = presentation.time, y = avg, fill = presentation.time)) + 
  geom_bar(stat="identity") + 
  facet_wrap(~operand) +  
  geom_errorbar(width=.1, aes(ymin=avg-c.95, ymax=avg+c.95))
## Warning: Stacking not well defined when ymin != 0

plot of chunk unnamed-chunk-24

Visualizing the data with different exclusion criteria, reveals that the results Sklar et al. found are largely driven by the specific exclusion criteria they use. If you use only one of the criteria they implement (i.e., only inluding subjects who fail the subjective test OR only including subjects who score less than 0.6 on the objective test), the facilitation effect disappears for the 2000ms presentation-time and decreases for the 1700ms presentation-time. Also, using confidence intervals instead of standar-error bars reveals that the effects Sklar et al. find are smaller than implied by the figure in the paper.

#I also made this plot. It is not easy to read and did not turn out to be that informative. I made it because I was curious to see if there were any differences among subjects in ow they responded to incongruent vs. congruent primes.

ggplot(ds, aes(x = RT, fill = congruent)) + 
  geom_histogram(position = position_dodge()) + 
  facet_wrap(operand ~ subid)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

plot of chunk unnamed-chunk-25

Challenge problem: Do you find any statistical support for Sklar et al.’s findings?

#I used Sklar et al.'s exclusion criteria in the analysis below.
library(lme4)
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## 
## The following object is masked from 'package:tidyr':
## 
##     expand
## 
## Loading required package: Rcpp
r2 = ds %>% 
  filter(operand == 'subtraction') %>%
  lmer(RT ~ congruent + (1 + congruent|subid), .)

summary(r2)
## Linear mixed model fit by REML ['lmerMod']
## Formula: RT ~ congruent + (1 + congruent | subid)
##    Data: .
## 
## REML criterion at convergence: 14621
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -4.333 -0.638 -0.090  0.588  5.069 
## 
## Random effects:
##  Groups   Name         Variance Std.Dev. Corr 
##  subid    (Intercept)  12458.7  111.62        
##           congruentyes    13.4    3.66   -1.00
##  Residual               9463.1   97.28        
## Number of obs: 1214, groups:  subid, 17
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept)    677.06      27.36   24.75
## congruentyes   -15.06       5.65   -2.66
## 
## Correlation of Fixed Effects:
##             (Intr)
## congruentys -0.256

The study is a repeated measures design, so I fit the data with a linear mixed model with congruence as the predictor (fixed) variable and reaction time as the outcome variable. I also included a random intercept and slope for subject. The model reveals that the congruence of the prime does affect reaction time: participants who saw a congruent prime were significantly faster at responding than participants who saw an incongruent prime (b = -15.06, t = -2.663). The variance of subject (12458.74) is quite large compared to the residual variance (9463.11); thus the random effect of intercept for subjects is accounting for a substantial amount of the variance in our model. The variance of slope (13.42), however, is small compared to the residual variance, but this is likely because the correlation between the random effects of intercept and slope is quite high (r = -1.00). In sum, I do find statistical support for Sklar et al.’s findings.